Random Sampling from Databases - A Survey

Author

  • Frank Olken
Abstract

This paper reviews recent literature on techniques for obtaining random samples from databases. We begin with a discussion of why one would want to include sampling facilities in database management systems. We then review basic sampling techniques used in construct[...] join are then described. We then describe sampling for estimation of aggregates (e.g., the size of query results). Here we discuss both clustered sampling and sequential sampling approaches. Decision-theoretic approaches to sampling for query optimization are also reviewed.


Related resources

Random Sampling from Database Files: A Survey

In this paper we survey known results on algorithms, data structures, and some applications of random sampling from databases. We first discuss various reasons for sampling from databases, and for inclusion of sampling as a DBMS operator. We consider basic sampling algorithms, sampling from trees, sampling from hash tables, and auxiliary memory resident index information to facilitate sampling.


Random Sampling from Databases

Random Sampling from Databases, by Frank Olken. Doctor of Philosophy in Computer Science, University of California at Berkeley. Professor Michael Stonebraker, Chair. In this thesis I describe efficient methods of answering random sampling queries of relational databases, i.e., retrieving random samples of the results of relational queries. I begin with a discussion of the motivation for including samp...


Sampling for Web Surveys

Web surveys are frequently based on samples drawn from panels with large amounts of nonresponse or haphazard selection. The availability of large-scale consumer and voter databases provides large amounts of auxiliary information for both panelists and population members. Sample matching, where a conventional random sample is selected from a population frame and the closest matching respondent ...


Simple Random Sampling from Relational Databases

Sampling is a fundamental operation for the auditing and statistical analysis of large databases. It is not well supported in existing relational database management systems. We discuss how to obtain samples from the results of relational queries without first performing the query. Specifically, we examine simple random sampling from selections, projections, joins, unions, and intersections. We...


A Bayesian Nominal Regression Model with Random Effects for Analysing Tehran Labor Force Survey Data

Large survey data are often accompanied by sampling weights that reflect the unequal selection probabilities used in complex sampling designs. Sampling weights act as expansion factors that, by scaling the subjects, make the sample representative of the population. The quasi-maximum likelihood method is one of the approaches for considering sampling weights in the frequentist framewo...




Publication date: 1994